Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 53730 |
| Missing cells | 305030 |
| Missing cells (%) | 29.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.4 MiB |
| Average record size in memory | 145.0 B |
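The missing-cell figures above can be cross-checked against the per-variable counts listed in the warnings. The sketch below is only an illustration: the dictionary is copied by hand from the "Missing" warnings in this report, and the variable names in the code are not part of the report itself.

```python
# Per-variable missing counts, copied from the "Missing" warnings in this report.
missing_per_variable = {
    "price": 3330, "nr_of_rooms": 3161, "area": 11267,
    "equiped_kitchen": 20574, "furnished": 26752, "terrace": 24972,
    "terrace_area": 36354, "garden": 39390, "garden_area": 45601,
    "total_land_area": 25353, "nr_of_facades": 18256,
    "swimming_pool": 32150, "building_condition": 17870,
}

n_variables = 19
n_observations = 53730

total_cells = n_variables * n_observations          # every cell in the table
missing_cells = sum(missing_per_variable.values())  # should match "Missing cells"
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

The 13 variables with missing values account for every missing cell; the remaining 6 variables are complete.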
Variable types
| NUM | 9 |
|---|---|
| CAT | 5 |
| BOOL | 5 |
Warnings
| type_of_sale has constant value "FOR_SALE" | Constant |
|---|---|
| subtype_of_property is highly correlated with type_of_property | High correlation |
| type_of_property is highly correlated with subtype_of_property | High correlation |
| price has 3330 (6.2%) missing values | Missing |
| nr_of_rooms has 3161 (5.9%) missing values | Missing |
| area has 11267 (21.0%) missing values | Missing |
| equiped_kitchen has 20574 (38.3%) missing values | Missing |
| furnished has 26752 (49.8%) missing values | Missing |
| terrace has 24972 (46.5%) missing values | Missing |
| terrace_area has 36354 (67.7%) missing values | Missing |
| garden has 39390 (73.3%) missing values | Missing |
| garden_area has 45601 (84.9%) missing values | Missing |
| total_land_area has 25353 (47.2%) missing values | Missing |
| nr_of_facades has 18256 (34.0%) missing values | Missing |
| swimming_pool has 32150 (59.8%) missing values | Missing |
| building_condition has 17870 (33.3%) missing values | Missing |
| price is highly skewed (γ1 = 22.2881236) | Skewed |
| nr_of_rooms is highly skewed (γ1 = 36.68169628) | Skewed |
| terrace_area is highly skewed (γ1 = 76.34372105) | Skewed |
| garden_area is highly skewed (γ1 = 71.7730675) | Skewed |
| total_land_area is highly skewed (γ1 = 55.12911822) | Skewed |
| id has unique values | Unique |
| nr_of_rooms has 1206 (2.2%) zeros | Zeros |
| total_land_area has 3514 (6.5%) zeros | Zeros |
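The γ1 values in the skewness warnings are sample skewness coefficients. The sketch below shows one common way to compute such a coefficient, the adjusted Fisher-Pearson estimator. It assumes this is the estimator behind the reported numbers, which the report does not state, and the function name and toy data are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness; entries equal to None are skipped."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    if m2 == 0:
        return 0.0                              # constant data has no skew
    g1 = m3 / m2 ** 1.5                         # biased estimator
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Hypothetical toy data: one symmetric sample, one with a single huge value.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 100]

print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

A single extreme value is enough to push the coefficient well above zero, which is consistent with the very large maxima reported for the skewed variables.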
Reproduction
| Analysis started | 2020-12-07 05:38:36.651730 |
|---|---|
| Analysis finished | 2020-12-07 05:39:08.699758 |
| Duration | 32.05 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
id
Real number (ℝ≥0)
| Distinct | 53730 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8832927.854 |
| Minimum | 1882546 |
| Maximum | 9066628 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1882546 |
|---|---|
| 5-th percentile | 8182803.45 |
| Q1 | 8801772 |
| median | 8948633.5 |
| Q3 | 9017599.75 |
| 95-th percentile | 9057278.55 |
| Maximum | 9066628 |
| Range | 7184082 |
| Interquartile range (IQR) | 215827.75 |
Descriptive statistics
| Standard deviation | 351575.5252 |
|---|---|
| Coefficient of variation (CV) | 0.03980282993 |
| Kurtosis | 36.28978961 |
| Mean | 8832927.854 |
| Median Absolute Deviation (MAD) | 84684 |
| Skewness | -4.526475785 |
| Sum | 4.745932136e+11 |
| Variance | 1.236053499e+11 |
| Monotonicity | Not monotonic |
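Several entries in the tables above are derived from others: the range from the minimum and maximum, the IQR from the quartiles, and the coefficient of variation and variance from the mean and standard deviation. The sketch below recomputes them from the reported values for this variable. It is a consistency check with hand-copied inputs, not the report's own code.

```python
# Values copied from the statistics tables for this variable.
minimum, maximum = 1882546, 9066628
q1, q3 = 8801772, 9017599.75
mean, std = 8832927.854, 351575.5252

value_range = maximum - minimum   # "Range"
iqr = q3 - q1                     # "Interquartile range (IQR)"
cv = std / mean                   # "Coefficient of variation (CV)"
variance = std ** 2               # "Variance"

print(value_range, iqr, round(cv, 6), f"{variance:.4e}")
```

Small differences in the last digits are expected, because the inputs are already rounded in the report.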
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8785918 | 1 | < 0.1% | |
| 8899948 | 1 | < 0.1% | |
| 9012517 | 1 | < 0.1% | |
| 9014564 | 1 | < 0.1% | |
| 9024803 | 1 | < 0.1% | |
| 9026850 | 1 | < 0.1% | |
| 8889633 | 1 | < 0.1% | |
| 8891680 | 1 | < 0.1% | |
| 8807705 | 1 | < 0.1% | |
| 8809752 | 1 | < 0.1% | |
| Other values (53720) | 53720 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1882546 | 1 | < 0.1% | |
| 2335739 | 1 | < 0.1% | |
| 2784938 | 1 | < 0.1% | |
| 3001135 | 1 | < 0.1% | |
| 3702839 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9066628 | 1 | < 0.1% | |
| 9066556 | 1 | < 0.1% | |
| 9066527 | 1 | < 0.1% | |
| 9066417 | 1 | < 0.1% | |
| 9066399 | 1 | < 0.1% |
locality
Real number (ℝ≥0)
| Distinct | 1051 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5372.749079 |
| Minimum | 1000 |
| Maximum | 9992 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1000 |
|---|---|
| 5-th percentile | 1070 |
| Q1 | 2260 |
| median | 5580 |
| Q3 | 8450 |
| 95-th percentile | 9550 |
| Maximum | 9992 |
| Range | 8992 |
| Interquartile range (IQR) | 6190 |
Descriptive statistics
| Standard deviation | 3111.313225 |
|---|---|
| Coefficient of variation (CV) | 0.5790914817 |
| Kurtosis | -1.617409373 |
| Mean | 5372.749079 |
| Median Absolute Deviation (MAD) | 3040 |
| Skewness | -0.03945887581 |
| Sum | 288677808 |
| Variance | 9680269.981 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8300 | 1268 | 2.4% | |
| 9000 | 910 | 1.7% | |
| 1180 | 842 | 1.6% | |
| 8400 | 814 | 1.5% | |
| 1000 | 809 | 1.5% | |
| 2000 | 617 | 1.1% | |
| 1050 | 582 | 1.1% | |
| 8370 | 551 | 1.0% | |
| 1070 | 520 | 1.0% | |
| 8670 | 472 | 0.9% | |
| Other values (1041) | 46345 | 86.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1000 | 809 | 1.5% | |
| 1020 | 172 | 0.3% | |
| 1030 | 418 | 0.8% | |
| 1040 | 217 | 0.4% | |
| 1050 | 582 | 1.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9992 | 7 | < 0.1% | |
| 9991 | 103 | 0.2% | |
| 9990 | 108 | 0.2% | |
| 9988 | 12 | < 0.1% | |
| 9982 | 11 | < 0.1% |
type_of_property
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| HOUSE | 28378 | 52.8% |
| APARTMENT | 22194 | 41.3% |
| APARTMENT_GROUP | 2341 | 4.4% |
| HOUSE_GROUP | 817 | 1.5% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 15 |
|---|---|
| Median length | 5 |
| Mean length | 7.179192258 |
| Min length | 5 |
subtype_of_property
Categorical
| Distinct | 25 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| HOUSE | 20685 | 38.5% |
| APARTMENT | 16956 | 31.6% |
| VILLA | 2954 | 5.5% |
| APARTMENT_GROUP | 2341 | 4.4% |
| DUPLEX | 1391 | 2.6% |
| APARTMENT_BLOCK | 1244 | 2.3% |
| GROUND_FLOOR | 1184 | 2.2% |
| PENTHOUSE | 1027 | 1.9% |
| MIXED_USE_BUILDING | 1008 | 1.9% |
| HOUSE_GROUP | 817 | 1.5% |
| Other values (15) | 4123 | 7.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 20 |
|---|---|
| Median length | 9 |
| Mean length | 7.981313977 |
| Min length | 3 |
price
Real number (ℝ≥0)
| Distinct | 4640 |
|---|---|
| Distinct (%) | 9.2% |
| Missing | 3330 |
| Missing (%) | 6.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 396311.4335 |
| Minimum | 1 |
| Maximum | 35000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 115000 |
| Q1 | 200000 |
| median | 285000 |
| Q3 | 405000 |
| 95-th percentile | 995000 |
| Maximum | 35000000 |
| Range | 34999999 |
| Interquartile range (IQR) | 205000 |
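The quartiles above (Q1 = 200000, Q3 = 405000) are enough to sketch a simple outlier screen for price. The example uses Tukey's 1.5 × IQR fences, which is a common convention chosen here for illustration. The report itself does not apply this rule, and the function name is hypothetical.

```python
# Quartiles copied from the quantile statistics table above.
q1, q3 = 200000, 405000
iqr = q3 - q1

# Tukey fences: values outside [lower_fence, upper_fence] are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(price):
    """Return True when a price falls outside the Tukey fences."""
    return price < lower_fence or price > upper_fence

print(lower_fence, upper_fence)
print(is_outlier(35000000), is_outlier(1))
```

Because the lower fence is negative, this rule flags the reported maximum of 35000000 but not the implausibly low minimum of 1, so low-end values would need a separate, domain-based check.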
Descriptive statistics
| Standard deviation | 523834.5236 |
|---|---|
| Coefficient of variation (CV) | 1.321774946 |
| Kurtosis | 1224.348396 |
| Mean | 396311.4335 |
| Median Absolute Deviation (MAD) | 95000 |
| Skewness | 22.2881236 |
| Sum | 1.997409625e+10 |
| Variance | 2.744026081e+11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 249000 | 670 | 1.2% | |
| 299000 | 622 | 1.2% | |
| 199000 | 622 | 1.2% | |
| 275000 | 595 | 1.1% | |
| 295000 | 578 | 1.1% | |
| 225000 | 544 | 1.0% | |
| 395000 | 505 | 0.9% | |
| 195000 | 503 | 0.9% | |
| 175000 | 494 | 0.9% | |
| 235000 | 473 | 0.9% | |
| Other values (4630) | 44794 | 83.4% | |
| (Missing) | 3330 | 6.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 1 | < 0.1% | |
| 65 | 1 | < 0.1% | |
| 2500 | 3 | < 0.1% | |
| 3500 | 1 | < 0.1% | |
| 4000 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 35000000 | 3 | < 0.1% | |
| 21600000 | 1 | < 0.1% | |
| 13500000 | 2 | < 0.1% | |
| 9500000 | 1 | < 0.1% | |
| 8750000 | 1 | < 0.1% |
type_of_sale
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| FOR_SALE | 53730 | 100.0% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
nr_of_rooms
Real number (ℝ≥0)
| Distinct | 45 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 3161 |
| Missing (%) | 5.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.92669422 |
| Minimum | 0 |
| Maximum | 204 |
| Zeros | 1206 |
| Zeros (%) | 2.2% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 204 |
| Range | 204 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 2.606963675 |
|---|---|
| Coefficient of variation (CV) | 0.8907536896 |
| Kurtosis | 2493.272403 |
| Mean | 2.92669422 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 36.68169628 |
| Sum | 148000 |
| Variance | 6.7962596 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=45)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3 | 16650 | 31.0% | |
| 2 | 15462 | 28.8% | |
| 4 | 7154 | 13.3% | |
| 1 | 5002 | 9.3% | |
| 5 | 2754 | 5.1% | |
| 0 | 1206 | 2.2% | |
| 6 | 1162 | 2.2% | |
| 7 | 435 | 0.8% | |
| 8 | 252 | 0.5% | |
| 9 | 124 | 0.2% | |
| Other values (35) | 368 | 0.7% | |
| (Missing) | 3161 | 5.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1206 | 2.2% | |
| 1 | 5002 | 9.3% | |
| 2 | 15462 | 28.8% | |
| 3 | 16650 | 31.0% | |
| 4 | 7154 | 13.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 204 | 3 | < 0.1% | |
| 165 | 1 | < 0.1% | |
| 80 | 4 | < 0.1% | |
| 71 | 1 | < 0.1% | |
| 70 | 1 | < 0.1% |
area
Real number (ℝ≥0)
| Distinct | 822 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 11267 |
| Missing (%) | 21.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 170.7482279 |
| Minimum | 1 |
| Maximum | 11366 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 57 |
| Q1 | 92 |
| median | 132 |
| Q3 | 200 |
| 95-th percentile | 404 |
| Maximum | 11366 |
| Range | 11365 |
| Interquartile range (IQR) | 108 |
Descriptive statistics
| Standard deviation | 170.3762744 |
|---|---|
| Coefficient of variation (CV) | 0.9978216261 |
| Kurtosis | 815.4907766 |
| Mean | 170.7482279 |
| Median Absolute Deviation (MAD) | 47 |
| Skewness | 17.71983869 |
| Sum | 7250482 |
| Variance | 29028.07487 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 100 | 873 | 1.6% | |
| 90 | 843 | 1.6% | |
| 120 | 816 | 1.5% | |
| 150 | 757 | 1.4% | |
| 110 | 706 | 1.3% | |
| 80 | 696 | 1.3% | |
| 140 | 668 | 1.2% | |
| 200 | 651 | 1.2% | |
| 85 | 631 | 1.2% | |
| 160 | 614 | 1.1% | |
| Other values (812) | 35208 | 65.5% | |
| (Missing) | 11267 | 21.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 2 | < 0.1% | |
| 5 | 2 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 14 | 1 | < 0.1% | |
| 15 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11366 | 1 | < 0.1% | |
| 8750 | 1 | < 0.1% | |
| 8521 | 1 | < 0.1% | |
| 6293 | 1 | < 0.1% | |
| 4380 | 1 | < 0.1% |
equiped_kitchen
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 20574 |
| Missing (%) | 38.3% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| INSTALLED | 17750 | 33.0% |
| HYPER_EQUIPPED | 6703 | 12.5% |
| SEMI_EQUIPPED | 3537 | 6.6% |
| USA_HYPER_EQUIPPED | 2138 | 4.0% |
| NOT_INSTALLED | 1917 | 3.6% |
| USA_INSTALLED | 887 | 1.7% |
| USA_SEMI_EQUIPPED | 197 | 0.4% |
| USA_UNINSTALLED | 27 | 0.1% |
| (Missing) | 20574 | 38.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 18 |
|---|---|
| Median length | 9 |
| Mean length | 8.188814443 |
| Min length | 3 |
furnished
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 26752 |
| Missing (%) | 49.8% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 25398 | 47.3% |
| True | 1580 | 2.9% |
| (Missing) | 26752 | 49.8% |
open_fire
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 51233 | 95.4% |
| True | 2497 | 4.6% |
terrace
Boolean
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 24972 |
| Missing (%) | 46.5% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| True | 28758 | 53.5% |
| (Missing) | 24972 | 46.5% |
terrace_area
Real number (ℝ≥0)
| Distinct | 213 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 36354 |
| Missing (%) | 67.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.13092772 |
| Minimum | 1 |
| Maximum | 20000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 9 |
| median | 16 |
| Q3 | 30 |
| 95-th percentile | 78 |
| Maximum | 20000 |
| Range | 19999 |
| Interquartile range (IQR) | 21 |
Descriptive statistics
| Standard deviation | 191.8688313 |
|---|---|
| Coefficient of variation (CV) | 6.586430516 |
| Kurtosis | 7174.296533 |
| Mean | 29.13092772 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 76.34372105 |
| Sum | 506179 |
| Variance | 36813.64841 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10 | 1033 | 1.9% | |
| 20 | 1005 | 1.9% | |
| 15 | 809 | 1.5% | |
| 6 | 784 | 1.5% | |
| 12 | 749 | 1.4% | |
| 8 | 726 | 1.4% | |
| 30 | 667 | 1.2% | |
| 9 | 594 | 1.1% | |
| 25 | 570 | 1.1% | |
| 5 | 563 | 1.0% | |
| Other values (203) | 9876 | 18.4% | |
| (Missing) | 36354 | 67.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 70 | 0.1% | |
| 2 | 313 | 0.6% | |
| 3 | 376 | 0.7% | |
| 4 | 551 | 1.0% | |
| 5 | 563 | 1.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20000 | 1 | < 0.1% | |
| 8000 | 2 | < 0.1% | |
| 6000 | 1 | < 0.1% | |
| 3500 | 1 | < 0.1% | |
| 3400 | 1 | < 0.1% |
garden
Boolean
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 39390 |
| Missing (%) | 73.3% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| True | 14340 | 26.7% |
| (Missing) | 39390 | 73.3% |
garden_area
Real number (ℝ≥0)
| Distinct | 1192 |
|---|---|
| Distinct (%) | 14.7% |
| Missing | 45601 |
| Missing (%) | 84.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1093.024972 |
| Minimum | 1 |
| Maximum | 1134500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 70 |
| median | 185 |
| Q3 | 591 |
| 95-th percentile | 2869.2 |
| Maximum | 1134500 |
| Range | 1134499 |
| Interquartile range (IQR) | 521 |
Descriptive statistics
| Standard deviation | 13683.43463 |
|---|---|
| Coefficient of variation (CV) | 12.51886734 |
| Kurtosis | 5827.776013 |
| Mean | 1093.024972 |
| Median Absolute Deviation (MAD) | 145 |
| Skewness | 71.7730675 |
| Sum | 8885200 |
| Variance | 187236383.2 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 100 | 255 | 0.5% | |
| 50 | 201 | 0.4% | |
| 200 | 188 | 0.3% | |
| 300 | 148 | 0.3% | |
| 400 | 133 | 0.2% | |
| 60 | 128 | 0.2% | |
| 80 | 128 | 0.2% | |
| 150 | 128 | 0.2% | |
| 250 | 123 | 0.2% | |
| 40 | 123 | 0.2% | |
| Other values (1182) | 6574 | 12.2% | |
| (Missing) | 45601 | 84.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 68 | 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 3 | < 0.1% | |
| 4 | 15 | < 0.1% | |
| 5 | 12 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1134500 | 1 | < 0.1% | |
| 312600 | 1 | < 0.1% | |
| 110000 | 1 | < 0.1% | |
| 94000 | 1 | < 0.1% | |
| 80978 | 1 | < 0.1% |
total_land_area
Real number (ℝ≥0)
| Distinct | 3393 |
|---|---|
| Distinct (%) | 12.0% |
| Missing | 25353 |
| Missing (%) | 47.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1261.731015 |
| Minimum | 0 |
| Maximum | 850000 |
| Zeros | 3514 |
| Zeros (%) | 6.5% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 140 |
| median | 350 |
| Q3 | 838 |
| 95-th percentile | 3490.4 |
| Maximum | 850000 |
| Range | 850000 |
| Interquartile range (IQR) | 698 |
Descriptive statistics
| Standard deviation | 7884.917281 |
|---|---|
| Coefficient of variation (CV) | 6.249285458 |
| Kurtosis | 5086.635899 |
| Mean | 1261.731015 |
| Median Absolute Deviation (MAD) | 268 |
| Skewness | 55.12911822 |
| Sum | 35804141 |
| Variance | 62171920.53 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3514 | 6.5% | |
| 150 | 216 | 0.4% | |
| 100 | 214 | 0.4% | |
| 200 | 178 | 0.3% | |
| 1000 | 173 | 0.3% | |
| 120 | 172 | 0.3% | |
| 300 | 169 | 0.3% | |
| 250 | 165 | 0.3% | |
| 180 | 149 | 0.3% | |
| 500 | 139 | 0.3% | |
| Other values (3383) | 23288 | 43.3% | |
| (Missing) | 25353 | 47.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3514 | 6.5% | |
| 1 | 24 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 850000 | 1 | < 0.1% | |
| 396300 | 1 | < 0.1% | |
| 250000 | 1 | < 0.1% | |
| 226952 | 1 | < 0.1% | |
| 220000 | 1 | < 0.1% |
nr_of_facades
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 18256 |
| Missing (%) | 34.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.76450358 |
| Minimum | 1 |
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 419.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 4 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 0.8632200106 |
|---|---|
| Coefficient of variation (CV) | 0.3122513629 |
| Kurtosis | -1.225925341 |
| Mean | 2.76450358 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.3848056267 |
| Sum | 98068 |
| Variance | 0.7451487866 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 17167 | 32.0% | |
| 4 | 9591 | 17.9% | |
| 3 | 8318 | 15.5% | |
| 1 | 395 | 0.7% | |
| 6 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% | |
| 10 | 1 | < 0.1% | |
| (Missing) | 18256 | 34.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 395 | 0.7% | |
| 2 | 17167 | 32.0% | |
| 3 | 8318 | 15.5% | |
| 4 | 9591 | 17.9% | |
| 5 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% | |
| 4 | 9591 | 17.9% | |
| 3 | 8318 | 15.5% |
swimming_pool
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 32150 |
| Missing (%) | 59.8% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 20448 | 38.1% |
| True | 1132 | 2.1% |
| (Missing) | 32150 | 59.8% |
building_condition
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 17870 |
| Missing (%) | 33.3% |
| Memory size | 419.8 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| AS_NEW | 14001 | 26.1% |
| GOOD | 12812 | 23.8% |
| TO_BE_DONE_UP | 3199 | 6.0% |
| TO_RENOVATE | 3128 | 5.8% |
| JUST_RENOVATED | 2524 | 4.7% |
| TO_RESTORE | 196 | 0.4% |
| (Missing) | 17870 | 33.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 14 |
|---|---|
| Median length | 4 |
| Mean length | 5.623580867 |
| Min length | 3 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
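The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the report's own implementation, and the function name and toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and the variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]

r_original = pearson_r(a, b)
r_rescaled = pearson_r([10 * v + 3 for v in a], b)  # shift and rescale a
print(r_original, r_rescaled)
```

Shifting and rescaling one variable leaves r unchanged, which is the invariance property mentioned above.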
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
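As a rough sketch of that calculation, the example below ranks both variables and then applies the Pearson formula to the ranks. Tied values receive the mean of their rank positions, which is a common convention assumed here. All function names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship ρ reaches its maximum while r stays below 1, which illustrates why ρ is the better choice for nonlinear monotonic data.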
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
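The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition. Statistical libraries often default to the tie-adjusted tau-b instead, so results on data with ties may differ from the report's matrix. The function name and toy data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 is a tie and counts for neither side
    return (concordant - discordant) / len(pairs)

# Hypothetical toy data: one swapped pair among four observations.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```

With four observations there are six pairs; one is discordant and five are concordant, giving (5 - 1) / 6.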
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | id | locality | type_of_property | subtype_of_property | price | type_of_sale | nr_of_rooms | area | equiped_kitchen | furnished | open_fire | terrace | terrace_area | garden | garden_area | total_land_area | nr_of_facades | swimming_pool | building_condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9044081 | 1083 | APARTMENT | APARTMENT | 265000.0 | FOR_SALE | 4.0 | 90.0 | INSTALLED | False | False | True | 13.0 | NaN | NaN | NaN | 4.0 | NaN | AS_NEW |
| 1 | 9043978 | 1000 | APARTMENT | APARTMENT | 1795000.0 | FOR_SALE | 4.0 | 650.0 | USA_HYPER_EQUIPPED | False | True | True | 400.0 | NaN | NaN | NaN | 3.0 | NaN | AS_NEW |
| 2 | 9044188 | 1050 | HOUSE | MANSION | 3800000.0 | FOR_SALE | 5.0 | 752.0 | HYPER_EQUIPPED | False | False | True | 40.0 | True | NaN | 340.0 | 2.0 | NaN | JUST_RENOVATED |
| 3 | 9041095 | 4860 | HOUSE | HOUSE | 320000.0 | FOR_SALE | 5.0 | 231.0 | NOT_INSTALLED | False | False | True | 30.0 | True | 1200.0 | 1421.0 | 3.0 | False | AS_NEW |
| 4 | 9042175 | 1160 | APARTMENT_GROUP | APARTMENT_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 9041098 | 6001 | APARTMENT_GROUP | APARTMENT_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 9043036 | 9600 | APARTMENT | APARTMENT | 195000.0 | FOR_SALE | 2.0 | 75.0 | INSTALLED | NaN | False | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | GOOD |
| 7 | 9042950 | 6010 | APARTMENT | TRIPLEX | 235000.0 | FOR_SALE | 3.0 | 149.0 | HYPER_EQUIPPED | False | False | True | 15.0 | NaN | NaN | NaN | 2.0 | False | AS_NEW |
| 8 | 9042073 | 1070 | APARTMENT | APARTMENT | 320000.0 | FOR_SALE | 3.0 | 130.0 | USA_HYPER_EQUIPPED | False | False | True | 14.0 | NaN | NaN | NaN | 2.0 | NaN | AS_NEW |
| 9 | 9042267 | 7181 | HOUSE | VILLA | 325000.0 | FOR_SALE | 2.0 | 130.0 | INSTALLED | False | False | True | 30.0 | True | 600.0 | 1043.0 | 4.0 | False | TO_BE_DONE_UP |
Last rows
| | id | locality | type_of_property | subtype_of_property | price | type_of_sale | nr_of_rooms | area | equiped_kitchen | furnished | open_fire | terrace | terrace_area | garden | garden_area | total_land_area | nr_of_facades | swimming_pool | building_condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53720 | 9020073 | 3020 | HOUSE | HOUSE | 411217.0 | FOR_SALE | 3.0 | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | 1000.0 | NaN | False | NaN |
| 53721 | 8968005 | 3040 | HOUSE | HOUSE | 413119.0 | FOR_SALE | 3.0 | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | 937.0 | NaN | False | NaN |
| 53722 | 8703267 | 3140 | HOUSE | HOUSE | 419322.0 | FOR_SALE | 3.0 | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | 730.0 | NaN | False | NaN |
| 53723 | 9027423 | 3020 | HOUSE | HOUSE | 442072.0 | FOR_SALE | 3.0 | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | 1000.0 | NaN | False | NaN |
| 53724 | 8714117 | 1860 | HOUSE_GROUP | HOUSE_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 53725 | 8667513 | 3080 | HOUSE_GROUP | HOUSE_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 53726 | 9020840 | 1785 | HOUSE | HOUSE | 417500.0 | FOR_SALE | 3.0 | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | 396.0 | 3.0 | NaN | NaN |
| 53727 | 8747154 | 3470 | HOUSE | COUNTRY_COTTAGE | 750000.0 | FOR_SALE | 3.0 | NaN | NaN | False | False | NaN | NaN | NaN | NaN | 0.0 | NaN | False | NaN |
| 53728 | 6992573 | 1500 | APARTMENT_GROUP | APARTMENT_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 53729 | 6918766 | 1500 | APARTMENT_GROUP | APARTMENT_GROUP | NaN | FOR_SALE | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |